Exerting Cost-Sensitive and Feature Creation Algorithms for Coronary Artery Disease Diagnosis

Authors

  • Roohallah Alizadehsani
  • Mohammad Javad Hosseini
  • Reihane Boghrati
  • Asma Ghandeharioun
  • Fahime Khozeimeh
  • Zahra Alizadeh Sani
Abstract

Cardiovascular diseases are among the leading causes of death worldwide, and coronary artery disease (CAD) is a major type. Angiography is the principal diagnostic modality for detecting stenosis of the heart arteries; however, it carries a high risk of complications and high costs. The present study applied data-mining algorithms to the Z-Alizadeh Sani dataset in order to investigate and compare rule-based and feature-based classifiers, and to explain why a preprocessing algorithm is effective on a dataset. Misclassifying a diseased patient has more serious consequences than misclassifying a healthy one. To this end, this paper employs 10-fold cross-validation on cost-sensitive algorithms with the base classifiers Naïve Bayes, Sequential Minimal Optimization (SMO), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and C4.5. The results show that the SMO algorithm yielded very high sensitivity (97.22%) and accuracy (92.09%) rates.

DOI: 10.4018/jkdb.2012010104

International Journal of Knowledge Discovery in Bioinformatics, 3(1), 59-79, January-March 2012. Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The mortality rates from diseases are much greater than those from accidents and natural disasters. The World Health Organization estimates that 17 million deaths worldwide each year occur due to cardiovascular diseases (Bonow, Mann, Zipes, & Libby, 2012). A major type of such diseases is coronary artery disease (CAD), which is reported to account for 7 million deaths worldwide per annum (Bonow et al., 2012). Data mining is the extraction of knowledge from a set of data; in other words, it is a process that uses intelligent techniques to extract knowledge from data (Bickel & Scheffer, 2004). Angiography is the modality of choice for the diagnosis of CAD.
Angiography determines the location and extent of the stenotic arteries; nevertheless, its high costs and risks for the patient have prompted researchers to seek less expensive and more effective methods with the aid of data mining. Moreover, cost-sensitive algorithms can be of great value in this field, because misclassifying a diseased patient and misclassifying a healthy one carry different costs.

Pedreira et al. (2005), using a neural network on UCI (UC Irvine Machine Learning Repository, 2012) datasets, attained an accuracy rate of 80% for CAD diagnosis. Das et al. (2009) applied a neural network to the Cleveland datasets (UC Irvine Machine Learning Repository, 2012) and reported an accuracy rate of 89.01%. Babaoglu et al. (2010) applied the Support Vector Machine (SVM) algorithm to exercise test data and achieved an accuracy rate of 79.17%. Tsipouras et al. (2008) used a fuzzy model to detect CAD. Itchhaporia et al. (1995) used a neural network to analyze exercise test data for the diagnosis of CAD. Polat et al. (2007) reached an accuracy of 87% for CAD diagnosis by using fuzzy systems and KNN. Alizadehsani et al. (2012) proposed a new ensemble algorithm that diagnoses CAD with 88.5% accuracy. Lee et al. (2008) used Heart Rate Variability (HRV) features for diagnosing CAD. Karaolis et al. (2010) and Srinivas et al. (2010) used the C4.5 and Naïve Bayes algorithms, respectively, to diagnose CAD.

One of the purposes of the present study was to investigate rule-based classifiers for CAD diagnosis. Because rule-based classifiers resulted in low specificity, other methods were sought in this paper. We use MetaCost, a cost-sensitive algorithm (Domingos, 1999), to distinguish CAD patients from healthy individuals.
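The idea behind cost-sensitive classification can be illustrated with a minimal Python sketch. MetaCost (Domingos, 1999) relabels each training example with the class that minimizes expected misclassification cost, using class probabilities estimated by bagged models, and then retrains the base classifier on the relabeled data. The sketch below shows only that minimum-expected-cost rule for the two-class case; the function names and the cost values (5.0 and 1.0) are hypothetical and are not taken from the paper.

```python
def min_expected_cost_label(p_cad, cost_fn=5.0, cost_fp=1.0):
    """Return the label with the lowest expected misclassification cost.

    p_cad   -- estimated probability that the patient has CAD
    cost_fn -- cost of labeling a CAD patient "Normal" (hypothetical value)
    cost_fp -- cost of labeling a healthy patient "CAD" (hypothetical value)
    """
    if not 0.0 <= p_cad <= 1.0:
        raise ValueError("p_cad must be a probability in [0, 1]")
    # Predicting "Normal" is wrong only if the patient is diseased.
    cost_if_normal = p_cad * cost_fn
    # Predicting "CAD" is wrong only if the patient is healthy.
    cost_if_cad = (1.0 - p_cad) * cost_fp
    return "CAD" if cost_if_cad <= cost_if_normal else "Normal"


def cost_threshold(cost_fn=5.0, cost_fp=1.0):
    """Probability at or above which predicting "CAD" is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)
```

With these assumed costs, a patient whose estimated CAD probability is only 0.3 is still labeled "CAD", because missing a diseased patient is weighted five times more heavily than a false alarm. With equal costs the same patient would be labeled "Normal". Raising the false-negative cost in this way trades specificity for sensitivity.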
The Sequential Minimal Optimization (SMO) (Platt, 1998), Naïve Bayes (Caruana & Niculescu-Mizil, 2006), C4.5 (Quinlan, 1996), Support Vector Machine (SVM) (Ben-Hur & Weston, 2010), and K-Nearest Neighbors (KNN) (Larose, 2005) algorithms were employed to analyze the Z-Alizadeh Sani dataset with no feature normalization. The performance of all these algorithms was measured using 10-fold cross-validation. The dataset contains information on 303 random visitors to Rajaei Cardiovascular, Medical and Research Center in Tehran, Iran. Before the cost-sensitive algorithms were applied, the dataset was enriched with three created features derived from the other features.

The effect of the created features was investigated both theoretically and practically. First, an assumption was made about the created features. Then a lemma was stated to provide a subset of samples that satisfies the assumption. Afterwards, another lemma was presented, using assumption 1, to discuss the effectiveness of the created features. In the experiments, the correctness of assumption 1 and the effectiveness of the created features were studied. As a result, high rates of both accuracy and sensitivity were obtained which, to the best of our knowledge, are superior to those of existing studies in this area.

The rest of this paper describes, in order, the medical dataset, the data-mining methods used, the reason for the effectiveness of the proposed method, the evaluation of the methods, and the conclusion and future research directions.

USED MEDICAL DATASET

The Z-Alizadeh Sani dataset was collected from 303 random visitors to Rajaei Cardiovascular, Medical and Research Center, Tehran, Iran, and contains 54 features (Alizadehsani et al., n.d.).
The features, along with their valid ranges, are listed in Tables 1 through 4. Details of these features and their influence on CAD can be found in Alizadehsani et al. (n.d.). The discretization ranges provided in Braunwald’s Heart Book (Bonow et al., 2012) were used to add some additional features to the dataset; these are introduced in Index 2. Some of these categories are given in Table 5.
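Discretizing a numeric feature into categories, as described above, can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name, the cut points, and the category labels are placeholders and are not the clinical ranges from Braunwald's Heart Book or the categories in Table 5.

```python
from bisect import bisect_right


def discretize(value, cut_points, labels):
    """Map a numeric value to a category using ascending cut points.

    A value equal to a cut point falls into the upper category.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    if list(cut_points) != sorted(cut_points):
        raise ValueError("cut_points must be sorted in ascending order")
    # bisect_right counts how many cut points are <= value,
    # which is exactly the index of the matching category.
    return labels[bisect_right(cut_points, value)]


# Placeholder ranges for illustration only; NOT the ranges used in the paper.
EXAMPLE_CUTS = [10.0, 20.0]
EXAMPLE_LABELS = ["low", "normal", "high"]
```

Calling `discretize(5.0, EXAMPLE_CUTS, EXAMPLE_LABELS)` yields the first category, and a value on a boundary such as 10.0 is assigned to the category above it. The discretized value can then be stored as an extra column next to the original numeric feature.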

Journal:
  • IJKDB

Volume 3  Issue 

Pages  -

Publication date 2012